18 research outputs found

A new hybrid metric for verifying parallel corpora of Arabic-English

Author: Alkahtani Saad
Liu Wei
Teahan William J.
Publication date: 12/02/2015

This paper discusses a new metric that has been applied to verify the quality of translation between sentence pairs in Arabic-English parallel corpora. The metric combines two techniques, one based on sentence length and the other based on compression code length. Experiments on sample Arabic-English parallel test corpora indicate that combining the two techniques identifies satisfactory and unsatisfactory sentence pairs more accurately than either sentence length or compression code length alone. The proposed method is effective at filtering noise and reducing mistranslations, resulting in greatly improved quality. Comment: in CCSEA-201

arXiv.org e-Print Archive

CiteSeerX

Crossref
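
The abstract of the hybrid-metric entry above names two signals, sentence length and compression code length, without giving the formula that combines them. The following is only a rough sketch of the idea under stated assumptions: `zlib` stands in for whatever compression model the authors used, and the function names, the symmetric-ratio test, and both thresholds are hypothetical illustrations rather than the paper's metric.

```python
import zlib


def code_length(text: str) -> int:
    """Compressed size in bytes (zlib is a stand-in compressor, an assumption)."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def ratio(a: float, b: float) -> float:
    """Symmetric ratio >= 1.0; 1.0 means the two quantities are equal."""
    lo, hi = sorted((a, b))
    return float("inf") if lo == 0 else hi / lo


def is_satisfactory(src: str, tgt: str,
                    max_len_ratio: float = 2.0,
                    max_code_ratio: float = 2.0) -> bool:
    """Accept a sentence pair only if BOTH signals agree (hypothetical rule)."""
    len_ratio = ratio(len(src), len(tgt))                   # sentence-length signal
    code_ratio = ratio(code_length(src), code_length(tgt))  # code-length signal
    return len_ratio <= max_len_ratio and code_ratio <= max_code_ratio
```

On very short sentences the fixed overhead of `zlib` dominates the code length, so a real filter would need a compressor and thresholds tuned to the language pair.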

An automatic cryptanalysis of simple substitution ciphers using compression

Author: Alkazaz Noor R.
Irvine Sean A.
Teahan William J.
Publication venue: 'Informa UK Limited'
Publication date: 01/01/2018

Bangor University Research Portal

Action selection through affective states modelling

Author: Headleand Christopher J.
Teahan William
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/07/2016

We introduce an action selection framework for the advanced behavioural animation of virtual creatures. In modern creative media, the behavioural animation of characters which act in a believable fashion is an ongoing challenge. Traditional action selection approaches which attempt to make an agent act rationally often fall short of the believability required for the modern consumer. Often the most believable action is not the most rational one, and our judgement of an agent's behaviour may also be based on the perception of its personality. Our approach, Affective Spaces Modelling, addresses these issues by creating a multi-dimensional environment constructed of aspect dimensions, with each aspect dimension representing a linear scale of a single component of the agent's internal state. Affective states can then be modelled by placing them in a single point in this environment. As the agent's state changes within the affective state space, different affects trigger appropriate actions. We demonstrate through a case study how the technique can be used to simulate different types of agent behaviour, operating both individually and as part of a group. Our case studies focus on groups of agents, allowing for the direct comparison of different personalities and examples of behavioural phenomena

University of Lincoln Institutional Repository
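
The affective-state space described in the entry above can be pictured as a set of named points in a space whose axes are the aspect dimensions. The sketch below assumes, purely for illustration, that the affect closest to the agent's current state (by Euclidean distance) is the one that triggers. The abstract does not state the selection rule, and the names `nearest_affect`, `calm` and `fear` are hypothetical.

```python
import math


def nearest_affect(state, affects):
    """Return the name of the affect whose point lies closest to `state`.

    `state` is a tuple with one coordinate per aspect dimension;
    `affects` maps an affect name to its point in the same space.
    Nearest-point selection is an assumption, not the authors' stated rule.
    """
    return min(affects, key=lambda name: math.dist(state, affects[name]))


# Two aspect dimensions, two affects placed at single points.
affects = {"calm": (0.0, 0.0), "fear": (1.0, 0.0)}
current = nearest_affect((0.9, 0.1), affects)
```

As the state tuple drifts through the space, the selected affect changes, and each affect could then be mapped to an action.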

Towards ethical robots: revisiting Braitenberg's vehicles

Author: Headleand Christopher J.
Teahan William
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/07/2016

The development of software and machines capable of making ethical judgements is a topic of great interest to both the research community and the public. Debates over the possibility and practicality of such systems have only intensified with the increased use of robotics in the military arena and the ubiquity of AI in commercial products. Modern innovations, such as the driverless car, will likely make artificial ethical agents a legal necessity. As a research field, it has received relatively little attention compared to other, more traditional, AI problems. In this paper, we propose a bottom-up reactive system that provides one possible solution. We begin by describing the motivation for this work: the development of artificial ethical agents could both mitigate some fears about the future of autonomous AI and provide insight into human moral reasoning. We then explore the related work, including current attempts at simulating ethics. We describe our novel approach to ethical simulation, Vessels, a reactive agent approach inspired by Braitenberg Vehicles. We then demonstrate how Vessels can be configured to simulate both Egoism and Altruism, comparing our simulations to the normative theory

University of Lincoln Institutional Repository

A Repetition Based Measure for Verification of Text Collections and for Text Categorization

Author: Dmitry V. Khmelev
William J. Teahan
Publication date: 01/01/2003

We suggest a way for locating duplicates and plagiarisms in a text collection using an R-measure, which is the normalized sum of the lengths of all suffixes of the text repeated in other documents of the collection. The R-measure can be effectively computed using the suffix array data structure. Additionally, the computation procedure can be improved to locate the sets of duplicate or plagiarised documents. We applied the technique to several standard text collections and found that they contained a significant number of duplicate and plagiarised documents. Another reformulation of the method leads to an algorithm that can be applied to supervised multi-class categorization. We illustrate the approach using the recently available Reuters Corpus Volume 1 (RCV1). The results show that the method outperforms SVM at multi-class categorization, and interestingly, that results correlate strongly with compression-based methods

CiteSeerX
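
One way to read the R-measure described in the entry above: for every suffix of the text, take the length of its longest prefix that also occurs in some other document, sum those lengths, and divide by the largest possible sum, l(l+1)/2 for a text of length l. The sketch below follows that reading and should be taken as an interpretation of the abstract, not the authors' code. It uses naive substring search in place of the suffix array the abstract mentions, so it is only practical for short strings.

```python
def longest_repeated_prefix(suffix: str, others: list[str]) -> int:
    """Length of the longest prefix of `suffix` found in any other document."""
    best = 0
    for doc in others:
        k = best
        # If suffix[:k+1] is absent from doc, no longer prefix can be present.
        while k < len(suffix) and suffix[: k + 1] in doc:
            k += 1
        best = max(best, k)
    return best


def r_measure(text: str, others: list[str]) -> float:
    """Normalised sum of repeated-prefix lengths over all suffixes of `text`."""
    l = len(text)
    if l == 0:
        return 0.0
    total = sum(longest_repeated_prefix(text[i:], others) for i in range(l))
    return 2 * total / (l * (l + 1))  # maximum possible total is l(l+1)/2
```

Under this reading, a text copied verbatim from the collection scores 1.0 and a text sharing no characters with it scores 0.0.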

Grammar Based Pre-Processing for PPM

Author: Nojood O. Aljehane
William J. Teahan
Publication date: 21/12/2022

In this paper, we apply grammar-based pre-processing prior to using the Prediction by Partial Matching (PPM) compression algorithm. This achieves significantly better compression for different natural language texts compared to other well-known compression methods. Our method first generates a grammar based on the most common two-character sequences (bigraphs) or three-character sequences (trigraphs) in the text being compressed and then, in a pre-processing phase prior to compression, substitutes these sequences with the respective non-terminal symbols defined by the grammar. This leads to significantly improved compression for various natural languages (a 5% improvement for American English, 10% for British English, 29% for Welsh, 10% for Arabic, 3% for Persian and 35% for Chinese). We describe further improvements using a two-pass scheme where the grammar-based pre-processing is applied again in a second pass through the text. We then apply the algorithms to the files in the Calgary Corpus and also achieve significantly improved compression, between 11% and 20%, when compared with other compression algorithms, including a grammar-based approach, the Sequitur algorithm

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY
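
The bigraph substitution step described in the grammar-based pre-processing entry above can be sketched as follows. This is a minimal illustration under several assumptions: non-terminals are drawn from the Unicode private-use area (and the input is assumed not to contain such characters), substitution is greedy from left to right, and the function names are hypothetical. The PPM compression stage that would follow is not shown.

```python
from collections import Counter


def build_grammar(text: str, n_rules: int = 4, first_symbol: int = 0xE000) -> dict:
    """Map the n_rules most frequent bigraphs to private-use non-terminals."""
    counts = Counter(text[i:i + 2] for i in range(len(text) - 1))
    return {bigraph: chr(first_symbol + k)
            for k, (bigraph, _) in enumerate(counts.most_common(n_rules))}


def substitute(text: str, grammar: dict) -> str:
    """Replace bigraphs with their non-terminals, scanning left to right."""
    out, i = [], 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in grammar:
            out.append(grammar[pair])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def expand(encoded: str, grammar: dict) -> str:
    """Invert substitute() by expanding each non-terminal to its bigraph."""
    inverse = {symbol: bigraph for bigraph, symbol in grammar.items()}
    return "".join(inverse.get(ch, ch) for ch in encoded)
```

A second pass in the spirit of the two-pass scheme would call `build_grammar` and `substitute` again on the encoded text, using a fresh range of non-terminal symbols.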